7 research outputs found

Neural machine translation of literary texts from English to Slovene

Authors: Arčan Mihael; Kuzman Taja; Vintar Špela
Publication venue: Machine Translation Summit 2019
Publication date: 26/08/2019

Neural Machine Translation has shown promising performance on literary texts. Since literary machine translation has not yet been researched for the English-to-Slovene translation direction, this paper aims to fill this gap by comparing bespoke NMT models, tailored to novels, with Google Neural Machine Translation. The translation models were evaluated with the BLEU and METEOR metrics, assessments of fluency and adequacy, and measurement of post-editing effort. The findings show that all evaluated approaches increased translation productivity. The translation model tailored to a specific author outperformed the model trained on a more diverse literary corpus on all metrics except the fluency scores. However, the translation model by Google still outperforms all bespoke models. The evaluation reveals very low inter-rater agreement on fluency and adequacy, based on the kappa coefficient values, and significant discrepancies between post-editors. This suggests that these methods might not be reliable, which should be addressed in future studies. This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 (Insight), co-funded by the European Regional Development Fund.

Irish Universities

Access to Research at National University of Ireland, Galway

Avtomatsko pridobivanje besednih zvez iz korpusa z uporabo leksikona SSJ (Automatic extraction of word phrases from a corpus using the SSJ lexicon)

Authors: Arhar Holdt Špela; Arčan Mihael
Publication venue: Znanstvena založba Filozofske fakultete
Publication date: 05/09/2023

Computational lexicography is an interdisciplinary field that focuses primarily on the automation of lexicographic procedures and the building of lexical databases of various kinds. In this paper we describe the automatic extraction of word phrases from a text corpus (noun phrases in which an adjectival modifier agrees in gender, case, and number with the following noun) and the transformation of the extracted lexical data into a syntactically suitable final form by means of the SSJ morphological lexicon.

Repository of the University of Ljubljana

Post-edited and error annotated machine translation corpus PE²rr 1.0

Authors: Arčan Mihael; Popović Maja
Publication venue: Insight Centre for Data Analytics, National University of Ireland, Galway
Publication date: 24/05/2016

The PE²rr corpus contains source-language texts from different domains along with their automatically generated translations into several morphologically rich languages, their post-edited versions, and error annotations of the performed post-edit operations. The main advantage of the corpus is that it combines post-editing and error classification, which have usually been treated as two independent tasks although they are naturally related.

Common Language Resources and Technology Infrastructure - Slovenia

Back-translation approach for code-switching machine translation: A case study

Authors: Arčan Mihael; Buitelaar Paul; Masoud Maraim; Torregrosa Daniel
Publication venue: AICS2019
Publication date: 13/12/2019

Recently, machine translation has demonstrated significant progress in terms of translation quality. However, most research has focused on translating purely monolingual texts on both the source and the target side of the parallel corpora, when in fact code-switching is very common in communication nowadays. Despite the importance of handling code-switching in the translation task, existing machine translation systems fail to accommodate code-switched content. In this paper, we examine the phenomenon of code-switching in machine translation for low-resource languages. Through different approaches, we evaluate the performance of our systems and make some observations about the role of code-mixing in the available corpora. This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under grant agreement number SFI/12/RC/2289_P2, co-funded by the European Regional Development Fund, and the Enterprise Ireland (EI) Innovation Partnership Programme under grant number IP20180729, NURS – Neural Machine Translation for Under-Resourced Scenarios.

Irish Universities

Access to Research at National University of Ireland, Galway

Neural machine translation of literary texts from English to Slovene

Authors: Arčan Mihael; Kuzman Taja; Vintar Špela
Publication venue: Machine Translation Summit 2019
Publication date: 04/09/2019

Irish Universities
